Evaluation of a branch target address cache
Authors
Abstract
Branches interrupt the sequential flow of instructions and introduce pipeline bubbles. The branch penalty can be a significant component of the effective CPI (cycles per instruction) in multiple-instruction-issue processors. Two key issues need to be resolved to alleviate this problem: a branch resolution scheme that decides the direction and target of a branch early in the pipeline, so that target instruction fetch can start, and mechanisms that minimize the impact of unpredictable branches. We propose a technique for caching branch target addresses in our fully predicated processor architecture that allows the branch decision to be made in the fetch stage of the pipeline. We discuss the impact of different branch target caching policies and cache sizes on the efficiency of the branch target address cache. The impact of register-relative branches, which may have variable target addresses, is considered and a solution is suggested.
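As a rough illustration of the idea in the abstract, the following is a minimal sketch of a direct-mapped branch target address cache that is consulted in the fetch stage. It is not the paper's design: the class name `BranchTargetAddressCache`, the default of 64 entries, the 4-byte instruction size, and the "allocate on taken, evict on not taken" policy are all assumptions chosen for the example. Overwriting the stored target on every taken branch is one simple way to cope with register-relative branches whose targets vary; the abstract does not say which solution the authors actually suggest.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    tag: int     # upper bits of the branch address
    target: int  # target address seen the last time the branch was taken


class BranchTargetAddressCache:
    """Direct-mapped cache from branch address to last taken target.

    Hypothetical sketch: sizes and policy are assumptions, not the paper's.
    """

    def __init__(self, num_entries: int = 64, instr_bytes: int = 4) -> None:
        if num_entries <= 0 or num_entries & (num_entries - 1):
            raise ValueError("num_entries must be a power of two")
        self.num_entries = num_entries
        self.instr_bytes = instr_bytes
        self.entries: List[Optional[Entry]] = [None] * num_entries

    def _split(self, pc: int) -> Tuple[int, int]:
        """Split a branch address into (index, tag)."""
        word = pc // self.instr_bytes
        return word % self.num_entries, word // self.num_entries

    def lookup(self, pc: int) -> Optional[int]:
        """Fetch stage: return the cached target, or None on a miss."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is not None and entry.tag == tag:
            return entry.target
        return None

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Called once the branch is resolved later in the pipeline."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if taken:
            # Assumed policy: allocate or overwrite on every taken branch,
            # so a changed register-relative target replaces the stale one.
            self.entries[index] = Entry(tag, target)
        elif entry is not None and entry.tag == tag:
            # Assumed policy: drop the entry when the branch falls through.
            self.entries[index] = None
```

In use, the fetch stage would call `lookup(pc)` every cycle and redirect fetch on a hit, while the stage that resolves the branch calls `update(pc, taken, target)`. Varying `num_entries` or swapping the policy inside `update` is one way to explore the kind of size and policy trade-offs the abstract mentions.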
Similar resources
Effectiveness of microarchitecture test program generation - Design & Test of Computers, IEEE
FSM Models As formulated in prior work, Figure 2 shows the FSM model for each of the 512 branch history table entries.15-17 A cold start initializes all entries in the branch history table to the start state, strong not taken. Any conditional branch whose address directly maps to the same branch history table entry will cause transitions in that entry’s FSM when the branch is resolved in the ex...
Don't Use the Page Number, but a Pointer to It
Most newly announced high-performance microprocessors support 64-bit virtual addresses, and the width of physical addresses is also growing. As a result, the size of the address tags in the L1 cache is increasing. The impact on chip area is particularly dramatic when small block sizes are used. At the same time, the performance of high-performance microprocessors depends more and more on the ...
Branch Prediction Strategies Using Instruction Cache
Pipelining is the major organizational technique that computers use to achieve high performance. Ideally, a pipelined uniprocessor can run at a rate that is limited by its slowest stage. Branches in the instruction stream disrupt the pipeline, by stalling and/or flushing the pipeline, and reduce the processor performance well below ideal. Since branch instructions constitute a significant percen...
Reducing Branch Delay to Zero in Pipelined Processors
A mechanism to reduce the cost of branches in pipelined processors is described and evaluated. It is based on the use of multiple prefetch, early computation of the target address, delayed branch, and parallel execution of branches. The implementation of this mechanism using a Branch Target Instruction Memory is described. An analytical model of the performance of this implementation is present...
Omitting Cache Look-Up for High-Performance, Low-Power Microprocessors
In this paper, we propose a novel architecture for low-power direct-mapped instruction caches, called the “history-based tag-comparison (HBTC) cache”. The cache attempts to reuse tag-comparison results to avoid unnecessary tag checks. Execution footprints are recorded in an extended BTB (Branch Target Buffer). In our evaluation, it is observed that the energy for tag comparison can be reduced ...